Skip to content

fix(vision): scope the no-vision capability error to the latest user image#8180

Merged
Nillth merged 1 commit into
zeroclaw-labs:masterfrom
NNet-Dev:fix/vision-marker-history-poison
Jun 23, 2026
Merged

fix(vision): scope the no-vision capability error to the latest user image#8180
Nillth merged 1 commit into
zeroclaw-labs:masterfrom
NNet-Dev:fix/vision-marker-history-poison

Conversation

@Nillth

@Nillth Nillth commented Jun 22, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • Base branch: master
  • What changed and why:
    • Sending an image to a model_provider without vision support (and with no
      vision_model_provider configured) raised provider_capability_error capability=vision
      AND left the [IMAGE:] marker in the long-lived session history. Because the error was
      keyed off a history-wide marker count, every later turn (even plain text) re-counted the
      stale marker and re-failed forever. The RPC/streaming path makes it permanent: it persists
      the inbound user message into the session history before the turn loop runs, so a failed
      image turn leaves its marker behind. A single image to a non-vision provider made the
      session unusable until restart.
    • Scope the capability error to the most recent genuine user message. resolve_vision_provider
      now errors only when the user just sent an image we cannot see; a carried-over marker
      (a prior failed image turn, or a vision -> non-vision model switch mid-session) degrades to
      text-only (markers stripped, surrounding text preserved) so the conversation continues.
    • New count_latest_user_image_markers() in zeroclaw-providers, the turn-scoped counterpart
      to count_user_image_markers(), reusing the same genuine-user-message predicate.
    • The user is still told once, on the turn they send the image; subsequent text turns recover.
  • Scope boundary: No change when a vision_model_provider is configured, no change to
    tool-result image degradation (already degraded), no change to the channel orchestrator path
    (which never persisted failed-turn images). No config / CLI / API / env surface change.
  • Blast radius: resolve_vision_provider is the single shared chokepoint inside
    run_tool_call_loop (one call site, turn/mod.rs), so the behaviour change reaches every
    transport (RPC/streaming, non-streaming, channels). The new function is purely additive.
  • Linked issue(s): None found (no existing issue).
  • Labels: bug applied. risk: and size: are auto-applied by the repo labeler
    (the path scope labels agent/provider/runtime are already on); the auto-risk rule
    classifies runtime-path changes as higher risk, so risk/size are left to the automation.

Validation Evidence (required)

Toolchain pinned to CI's 1.93.0.

cargo fmt --all -- --check                                   # clean
cargo clippy --all-targets -- -D warnings                    # clean (root; validate.sh)
cargo clippy -p zeroclaw-providers -p zeroclaw-runtime -p zeroclaw-channels --all-targets -- -D warnings  # exit 0
cargo test -p zeroclaw-providers -p zeroclaw-runtime
cargo test -p zeroclaw-channels vision

Tail output:

providers: test result: ok. 1022 passed; 0 failed
runtime:   2343 passed; 1 failed; 2 ignored
  count_latest_user_image_markers_scopes_to_newest_user_message ... ok
  run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider ... ok
  run_tool_call_loop_returns_structured_error_for_non_vision_provider ... ok   (preserved)
channels:  test result: ok. 6 passed; 0 failed
  e2e_failed_vision_turn_does_not_poison_follow_up_text_turn ... ok
  e2e_photo_attachment_rejected_by_non_vision_provider ... ok
  • Beyond CI - what I manually verified: Traced the RPC poison path (the streaming path
    persists the user message to the long-lived session history before the loop). Confirmed the
    single call site of resolve_vision_provider. Reproduced the fix end-to-end through the
    shared engine (carried-over image -> stripped to [media attachment], plain-text turn
    succeeds). Confirmed the first-turn capability error is preserved.
  • The one runtime test failure is unrelated:
    cron::store::tests::remove_job_emits_structured_cron_delete_event is a pre-existing
    test-isolation flake (a UUID assert against a process-global log broadcast; a sibling
    remove_job caller races its event in under in-process parallelism). The diff touches no
    cron code, the test passes in isolation, and CI's nextest (process-per-test) isolates it.
  • Intentionally skipped: the exact --features ci-all clippy combo (needs glib-2.0 /
    libudev system libs unavailable on this host). The change touches none of the voice/desktop
    features ci-all adds; deferred to CI. A docs-coverage heuristic WARN fired on the new
    pub fn, but it is an internal cross-crate helper, not a user-facing surface, so no docs are
    needed.

Security & Privacy Impact (required)

  • New permissions, capabilities, or file system access scope? No
  • New external network calls? No
  • Secrets / tokens / credentials handling changed? No
  • PII, real identities, or personal data in diff, tests, fixtures, or docs? No
  • The degrade path only strips media-marker references from text sent to a text-only provider;
    the image bytes never reach the model in either branch, and no trust boundary or policy check
    is affected.

Compatibility (required)

  • Backward compatible? Yes
  • Config / env / CLI surface changed? No
  • Upgrade steps: none. Behaviour change is strictly a bug fix: a previously-poisoned session
    now recovers on the next turn instead of failing permanently.

Rollback (required for risk: medium and risk: high)

  • Fast rollback command/path: git revert <sha> (single, self-contained commit; the new
    function is additive, so reverting cleanly restores the prior history-wide-count behaviour).
  • Feature flags or config toggles: None. (Operators wanting the prior routing can configure
    a vision_model_provider, which is unaffected by this change.)
  • Observable failure symptoms: grep logs for provider_capability_error with
    capability=vision, or the degrade WARN no vision route for carried-over/tool-result image marker(s); degrading to text-only. A regression would show the capability error re-firing on
    plain-text turns after an image, or an image the user just sent being silently dropped.

…image

Sending an image to a model_provider without vision support (and with no
vision_model_provider configured) raised a provider_capability_error AND
left the [IMAGE:] marker in the long-lived session history. The capability
error was triggered by a history-wide marker count, so every later turn,
even plain text, re-counted the stale marker and re-failed forever. The
RPC/streaming path makes this permanent: it persists the user message into
the session history before the loop runs, so a failed image turn leaves its
marker behind. A single image to a non-vision provider made the session
unusable until restart.

Scope the capability error to the most recent genuine user message:

- providers/multimodal: add count_latest_user_image_markers(), the
  turn-scoped counterpart to count_user_image_markers (skips tool-result
  carriers and older user messages).
- runtime/turn/vision_route: error only when the latest user message
  carries an image (the user just sent something we cannot see); a
  carried-over marker, from an earlier failed turn or a vision to
  non-vision model switch, degrades to text-only (markers stripped) so the
  turn continues. This also covers the model-switch case.

The user is still told once, on the turn they send the image; subsequent
text turns recover instead of re-failing.

Tests:
- providers: count_latest_user_image_markers_scopes_to_newest_user_message
- runtime: run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider
  (end-to-end through the shared engine; asserts the carried-over marker is
  stripped and the plain-text turn succeeds)
@github-actions github-actions Bot added agent Auto scope: src/agent/** changed. provider Auto scope: src/providers/** changed. runtime Auto scope: src/runtime/** changed. labels Jun 22, 2026
@Nillth Nillth added the bug Something isn't working label Jun 22, 2026
@Audacity88 Audacity88 added risk: high Auto risk: security/runtime/gateway/tools/workflows. size: M Auto size: 251-500 non-doc changed lines. labels Jun 22, 2026
@Audacity88 Audacity88 added this to the v0.8.3 milestone Jun 22, 2026

@singlerider singlerider left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review (no prior reviews/comments). Verified at head a9084c42 (CI green, MERGEABLE). Read the source; did not run local Cargo. All 🟢; approving. This fixes a real session-bricking bug at the right chokepoint.

🟢 The latest-vs-carried-over distinction is the correct fix

resolve_vision_provider (vision_route.rs:12) now branches on count_latest_user_image_markers (the newest genuine user message) rather than the history-wide count. The three arms are exactly right:

  • User just sent an unviewable image (latest_user_image_marker_count > 0, no vision_model_provider): surface ProviderCapabilityError so the attachment is not silently dropped.
  • Only carried-over/tool-result markers remain (else): degrade_strip_images = true, WARN once, continue text-only. This is what stops a single failed image turn from re-failing every later plain-text turn forever, which is the reported session-until-restart bug.
  • vision_model_provider configured: route to it; misconfigured non-vision route surfaces loudly.

🟢 No SSOT violation and the degrade path is non-destructive

count_latest_user_image_markers (multimodal.rs:293) reuses the same is_prompt_tool_result_message genuine-user predicate as count_user_image_markers; it is the turn-scoped counterpart, not a second source of truth. The degrade flag is threaded from resolve_vision_provider (turn/mod.rs:412) into prepare_messages_for_iteration (:439), which strips markers via strip_media_markers on a COPY of the outbound messages (vision_route.rs:138), so the long-lived session history is never mutated and the surrounding caption/metadata text survives. No filesystem path or data URI reaches the text-only provider.

🟢 Single chokepoint, attributed logging, honest disclosure

resolve_vision_provider is the one call site inside run_tool_call_loop (turn/mod.rs), so the behaviour change reaches every transport (RPC/streaming, non-streaming, channels) uniformly. The degrade WARN carries category/outcome/attrs (not a bare record!). Tests pin all three arms (count_latest_user_image_markers_scopes_to_newest_user_message, run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider, and the preserved run_tool_call_loop_returns_structured_error_for_non_vision_provider). The disclosed "1 failed" runtime test is the pre-existing cron::store test-isolation flake (process-global log broadcast race, passes in isolation, isolated by CI nextest), unrelated to this diff which touches no cron code; official CI is green.

Additive, risk:high handled correctly, scope tight (3 files). Approving.

@Nillth Nillth merged commit 451b15e into zeroclaw-labs:master Jun 23, 2026
19 checks passed
wangmiao0668000666 pushed a commit to wangmiao0668000666/zeroclaw that referenced this pull request Jun 23, 2026
…image (zeroclaw-labs#8180)

Sending an image to a model_provider without vision support (and with no
vision_model_provider configured) raised a provider_capability_error AND
left the [IMAGE:] marker in the long-lived session history. The capability
error was triggered by a history-wide marker count, so every later turn,
even plain text, re-counted the stale marker and re-failed forever. The
RPC/streaming path makes this permanent: it persists the user message into
the session history before the loop runs, so a failed image turn leaves its
marker behind. A single image to a non-vision provider made the session
unusable until restart.

Scope the capability error to the most recent genuine user message:

- providers/multimodal: add count_latest_user_image_markers(), the
  turn-scoped counterpart to count_user_image_markers (skips tool-result
  carriers and older user messages).
- runtime/turn/vision_route: error only when the latest user message
  carries an image (the user just sent something we cannot see); a
  carried-over marker, from an earlier failed turn or a vision to
  non-vision model switch, degrades to text-only (markers stripped) so the
  turn continues. This also covers the model-switch case.

The user is still told once, on the turn they send the image; subsequent
text turns recover instead of re-failing.

Tests:
- providers: count_latest_user_image_markers_scopes_to_newest_user_message
- runtime: run_tool_call_loop_degrades_carried_over_image_on_non_vision_provider
  (end-to-end through the shared engine; asserts the carried-over marker is
  stripped and the plain-text turn succeeds)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Auto scope: src/agent/** changed. bug Something isn't working provider Auto scope: src/providers/** changed. risk: high Auto risk: security/runtime/gateway/tools/workflows. runtime Auto scope: src/runtime/** changed. size: M Auto size: 251-500 non-doc changed lines.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants